Film versus Digital

The question of film versus digital image quality has been debated many times and I expect that the debate will continue for many more decades.   This article is merely an opportunity for me to express my own opinions on the subject.   As they say, “where fools rush in,” here goes…

The premise of most arguments is resolution, that is, the ability to discriminate small details in the image.   Film aficionados assume that film emulsions mean that film images are analogue while digital images consist of only discrete values.   Neither assumption is totally correct.

Let’s start with an elementary review of a digital image format.  

The Digital Image

Figure 1

This illustration is not to scale.   Each pixel can hold only one tone, so the grid shown is conceptual only.  

Megapixels are the most commonly used metric of digital quality.   They do not tell the whole story.   They give us a count of discrete elements in an image but tell us nothing about the size of the image.   For that we need pixels per inch (PPI), or the pixel density.  

The next metric is the bit-depth.   This tells us how many discrete tonal differences can be represented in a pixel.   Of course each pixel represents only one of these tones.   More bits per pixel results in more tones and larger file sizes (megabytes).   Don’t confuse bit-depth with dynamic range.   Dynamic range defines the difference between the darkest and lightest tones captured from a scene.   Bit-depth defines the granularity of the individual tones in between.   If dynamic range were a ladder, bit-depth would define the number of steps on the ladder, not its height.   Dynamic range would describe the height of the ladder.  
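
For readers who like to see the arithmetic, here is a small Python sketch of the ladder analogy.   The 1000:1 contrast ratio is purely illustrative, not a measurement of any particular scene or sensor.

```python
from math import log2

def tone_count(bit_depth):
    """Rungs on the ladder: discrete tonal levels for a given bit-depth."""
    return 2 ** bit_depth

def dynamic_range_stops(contrast_ratio):
    """Height of the ladder: scene contrast expressed in photographic stops."""
    return log2(contrast_ratio)

print(tone_count(8), tone_count(12))          # 256 vs. 4096 tones
print(round(dynamic_range_stops(1000), 1))    # ~10 stops, regardless of bit-depth
```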

Before the image is stored on some removable media it may be compressed.   So the file size is not a reliable indicator of the image quality.  

Digital Sensors

Figure 2

Digital sensors consist of many sensor sites that capture information about the light falling on them.   Light carries only two metrics: the wavelength, which determines the color, and the intensity, which determines the brightness.   Together, these represent a tone in the visible spectrum.  

A color sensor site consists of three or more sensor receptors, also called sensor elements.   Each sensor receptor is a unique photodiode.   Photodiodes cannot record the wavelength (color).   So they are covered with a filter that restricts their sensitivity to a single portion of the spectrum.   Each receptor records the intensity for a different range of wavelengths.   The physical arrangement of these receptors and the filters used accommodate different designs.   There are also differences in the electronics, such as CMOS and CCD, but they are not important at this point in the discussion.  

The most prevalent design for consumer cameras is the Bayer layout of RGGB sensor receptors.   This has redundant green receptors.   Since the eye is most sensitive to subtle tonal differences in green and green occupies a fairly wide portion of the spectrum, this redundancy is not a bad thing.   And the sensor site results in a square pixel.   Sony has a modification to this layout which substitutes a cyan filter for one of the green filters.  
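
For illustration only, here is a minimal Python sketch of the RGGB geometry.   It shows nothing but the arrangement of the receptors; real sensors of course add the color filters, microlenses, and supporting electronics.

```python
import numpy as np

# A sketch of the Bayer RGGB mosaic: each 2x2 cell holds one red, two green,
# and one blue receptor. Layout only; no filters or electronics are modeled.
def bayer_mask(height, width):
    """Return an array of channel labels ('R', 'G', 'B') for an RGGB mosaic."""
    mask = np.empty((height, width), dtype='<U1')
    mask[0::2, 0::2] = 'R'   # red on even rows, even columns
    mask[0::2, 1::2] = 'G'   # green shares each row with red...
    mask[1::2, 0::2] = 'G'   # ...and each column with blue
    mask[1::2, 1::2] = 'B'   # blue on odd rows, odd columns
    return mask

print(bayer_mask(4, 4))
# [['R' 'G' 'R' 'G']
#  ['G' 'B' 'G' 'B']
#  ['R' 'G' 'R' 'G']
#  ['G' 'B' 'G' 'B']]
```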

A tri-linear arrangement has one each of the R, G, and B receptors at each site.   This results in a rectangular site, so interpolation is required in either the horizontal or vertical direction to create a square pixel.  

The Foveon site is arranged in layers.   The top layer captures the blue wavelengths.   The middle layer captures the green wavelengths.   The bottom layer captures the red wavelengths.   This layout results in a square pixel.   In theory, the site size can be much smaller, but in practice the sites are usually larger.   This design is also less sensitive to artifacts from the spread function of light as it passes through the lens to the image plane.   To see more on the Foveon layout, click here.  

Fuji uses hexagonal sensor receptors arranged similarly to the Bayer pattern but rotated 45°.   Fuji has not published a clear diagram of the physical layout, so this representation cannot be verified.   Their newest design (CCD SR) includes an additional receptor dedicated to capturing highlights.   This receptor is not color filtered and has lower sensitivity.  

There are also beam-splitting configurations with prisms and multiple sensors.   These can present some unique optical challenges.   And there are shift and scan configurations where the sensor itself moves while multiple image samples are taken.   These tend to require significantly longer exposure times.

To my knowledge no one has designed a digital camera sensor to capture simply the wavelength and intensity, similar to capturing the wavelength and volume of an audio signal.   Such a device already exists. It is called a spectrometer.   These are used in astronomy to examine the chemical composition of distant objects in space.   They are also sometimes used to measure the light reflected off papers or dyes, and for film and video display measurements.   I should also note that there are some very high-end cameras that actually shift the entire sensor plane while taking three separate exposures.   And others that scan the image one line at a time.   These are rather slow but they can produce extremely high resolution and accuracy.  

These alternate designs are interesting and they all have their merits and demerits.   There are also various optical filters that may sit in front of the sensors.   Some examples are low-pass filters, infrared-blocking filters, and micro lenses that refocus or intentionally blur the light rays.   My objective is not to evaluate these designs but to set the stage for the next steps in the formation of the actual image pixels.  

The First Steps

The first thing that needs to be done in the formation of an image is to transform the data recorded by the individual receptors into something that accurately represents the full spectrum of light.   Here the terminology is very confusing.   Sometimes it seems to be on purpose.   As we’ve said in the computer industry, “Complexity keeps the riff-raff out of the business”.   The two most common terms used are demosaicing and colorimetric conversion.   It would be better if demosaicing were used only to refer to algorithms that increase the apparent resolution of low-resolution sensors.   Colorimetric conversion is a better term, but it has two meanings.   One relates to the measurement of tones with light-sensitive meters and the other relates tones to the human visual response.   In the first steps of image processing, the visual response is not important; the sensor response is.  

The sensor receptors have recorded the intensity of several sampled wavelengths.   These each cover a rather broad range (they are not very granular).   These are measurements of light intensity, not photographic RGB numbers.   The actual encoding depends on how many and what type of sensor receptors there are at a single site and the circuit sensitivity curves of the individual receptors.   Though the algorithms vary, all digital sensors need these conversions.  

This process is basically taking two variables (wavelength range and intensity) from each sensor channel and converting them into three new variables (RGB) that more accurately represent a specific hue, the corresponding saturation, and the brilliance.   These RGB values reconstruct tones such as yellow, which were not directly recorded.   To do this properly you need to recognize that each range of wavelengths responds to light intensity differently and that each filter or sensor receptor responds differently.   This is sometimes referred to as the linear space since the linear channel values (voltage) are assigned to linear spectrographic values.  
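
A rough Python sketch may make the arithmetic concrete.   The 3x3 matrix values here are invented purely for illustration; the real coefficients come from measuring each receptor's spectral and intensity response.

```python
import numpy as np

# Hypothetical conversion matrix. Real values are derived from the measured
# response curves of the receptors and filters; these numbers are made up.
CAMERA_TO_LINEAR_RGB = np.array([
    [ 1.8, -0.6, -0.2],
    [-0.3,  1.5, -0.2],
    [ 0.0, -0.4,  1.4],
])

def to_linear_rgb(raw_channels):
    """Map normalized raw channel intensities (R, G, B order) to linear RGB."""
    rgb = CAMERA_TO_LINEAR_RGB @ np.asarray(raw_channels, dtype=float)
    return np.clip(rgb, 0.0, 1.0)

# Strong red and green responses reconstruct a yellowish tone that no single
# receptor recorded directly.
print(to_linear_rgb([0.6, 0.7, 0.1]))
```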

This resulting RGB tone is a pixel, from a single sensor site.   Before this step, they are sensor elements, not pixels.   The accepted and better term for a sensor site is “sel”.  

Demosaicing

The dictionary defines mosaic (abbreviated, as a verb): to make a picture with small bits of colored tiles.   So, to demosaic would be to remove the tiles and replace them with tones.   Demosaicing algorithms may be designed to fix or invent tile values that might be missed with a high-resolution image on a lower-resolution sensor.   Just as often, these will soften the image and produce other artifacts.   They are useful in your mobile phone camera, but dangerous in your high-resolution SLR camera.   Sometimes, colorimetric conversion is included in the description of demosaicing.   This is where the confusion comes from.  
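
To make the distinction concrete, here is a toy Python sketch of a bilinear-style demosaic over an RGGB mosaic.   It simply averages whatever samples of each channel fall in a 3x3 neighborhood (with wrap-around edges), which is far cruder than any in-camera algorithm, but it shows where the softening comes from.

```python
import numpy as np

def naive_demosaic(raw):
    """Toy demosaic of an RGGB mosaic (raw: 2-D float array).
    Each pixel's missing channels are filled with the mean of the known
    samples in the surrounding 3x3 neighborhood (edges wrap around)."""
    h, w = raw.shape
    rgb = np.zeros((h, w, 3))
    masks = np.zeros((h, w, 3), dtype=bool)   # which channel each site recorded
    masks[0::2, 0::2, 0] = True               # R
    masks[0::2, 1::2, 1] = True               # G
    masks[1::2, 0::2, 1] = True               # G
    masks[1::2, 1::2, 2] = True               # B
    for c in range(3):
        known = np.where(masks[:, :, c], raw, 0.0)
        count = masks[:, :, c].astype(float)
        acc = np.zeros((h, w))
        n = np.zeros((h, w))
        for dy in (-1, 0, 1):                 # accumulate the 3x3 neighborhood
            for dx in (-1, 0, 1):
                acc += np.roll(np.roll(known, dy, axis=0), dx, axis=1)
                n += np.roll(np.roll(count, dy, axis=0), dx, axis=1)
        rgb[:, :, c] = acc / np.maximum(n, 1.0)
    return rgb

mosaic = np.random.rand(6, 6)                 # stand-in for raw sensor data
print(naive_demosaic(mosaic).shape)           # (6, 6, 3)
```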

The Color Space

These RGB values do not represent colors that we can see until they are mapped into a color space.   This is identified by a color profile.   This process maps the brightness and tonal values to the visual response and determines the gamut of the colors.   This is where gamma correction occurs.   Mathematically, this is another conversion.   Now the color values have to represent something that can be reproduced within the gamut of the target color space.   Since the gamut of the color space represents the response curves of human vision, this is often referred to as a gamma-corrected or colorimetric color space.  
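
As one concrete example, the sRGB standard applies the transfer curve below when linear values are mapped into its gamma-corrected space.   This sketch ignores the matrix (gamut mapping) portion of the conversion.

```python
def srgb_encode(linear):
    """Apply the sRGB transfer (gamma) curve to a linear value in [0, 1]."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055

# Mid-grey in linear light becomes much brighter in the encoded space,
# which matches the roughly logarithmic response of human vision.
print(round(srgb_encode(0.18), 3))   # ~0.461
```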

Without understanding these important but different concepts, the terminology can be very confusing.   Colorimetric conversion, demosaicing, and gamma corrections for visual response have unique and different functions that are performed to create the final RGB values.  

The Rest

The next step is interpolation.   This is necessary if the sensor site is not square, since image editing software and display devices usually assume square pixels.   Otherwise it is optional, but it is sometimes employed by in-camera processing to make larger or smaller images.   Some implementations of the Bayer pattern will use two sensor sites to create three pixels.   Whenever and wherever interpolation is performed, some data is invented or some data is lost.  

There are two factors that contribute to image aliasing, so the next step is anti-aliasing.   One factor is the spatial sampling of the image data.   This is more descriptively referred to as moiré.   Another factor is that low-resolution images with diagonal or curved edges can appear jagged.   This form of aliasing is usually much more important in graphic images and text than in photographic images, unless they have been interpolated.   Anti-aliasing is sometimes performed by examining similar elements in neighboring sensor sites and smoothing the tonal transitions.   This has the effect of emulating a higher sampling frequency.   The basic process is always to replace some pixels along the edge with a tone between the two contrasting tones that define the edge.   Not surprisingly, this can sometimes lead to soft digital images.  
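
Here is a toy one-dimensional Python sketch of that basic process.   Real anti-aliasing works on two-dimensional neighborhoods and is far more selective about which pixels it touches.

```python
def soften_edge(row):
    """Crude 1-D box filter: replace each interior pixel with the average of
    itself and its two neighbors, putting intermediate tones along hard edges."""
    out = list(row)
    for i in range(1, len(row) - 1):
        out[i] = (row[i - 1] + row[i] + row[i + 1]) // 3
    return out

hard_edge = [0, 0, 0, 255, 255, 255]
print(soften_edge(hard_edge))   # [0, 0, 85, 170, 255, 255] -- a softer edge
```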

There are several additional important processes that are typically performed on the image data.   One is to achieve the correct color balance that matches the light source.   Others include sharpening and noise reduction.   There are many variations on these.   The objective is always to enhance the image quality.   It is up to you to evaluate the effectiveness of a manufacturer’s efforts.  

Digital Raw Files

With what are known as digital RAW files, the sensor data is simply recorded as normalized voltage values from the individual sensors.   The image data is not recorded as RGB tonal values in any given color space.   All processing is left to the editor software that opens the RAW file.  

Film versus Digital: Similarities and Differences

Film records a latent image in clumps (grain) of silver crystals and dyes on layers of emulsions.   Each clump is analogous to a pixel and each dye layer is analogous to the bit depth.   Some films have smaller grains and/or more dye layers than others.  

The first metric for this discussion is pixel size at the sensor compared to film grain size.   Film grain size is hard to quantify.   The silver halide crystals are as small as 2 um (microns, or micrometers).   But graininess is determined by random clumps of these crystals and dyes.   The sizes quoted for these are in the range of 6-8 um at the smallest.   This varies with different films and ISO speeds of course.   The ISO 800 average would be closer to 17 um or larger.   Anyway, this is primarily where the quote of 11-13 million pixels for digital-to-film equality comes from.   The D100 pixel size is 7.6 um.   But the sensor itself is only about 2/3 the size of a 35mm film frame.   If it were full 35mm format the megapixel rating would be about 14.  

Converting these metrics to pixels per inch (PPI) provides some more insight.   A top quality film negative with 6 um grain would be 4233 PPI.   The Nikon D100 sensor delivers 3333 PPI.   Film scanners range from 1200 to 4000 PPI.   Print scanners typically range from 300 to 1200 PPI.   The point is that the density of the image details is closer than one would think.
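
The arithmetic behind those PPI figures is simple division; this small sketch uses the approximate pitches quoted above.

```python
MICRONS_PER_INCH = 25_400

def pitch_to_ppi(pitch_um):
    """Pixels (or grain clumps) per inch for a given pitch in microns."""
    return MICRONS_PER_INCH / pitch_um

print(round(pitch_to_ppi(6.0)))   # ~4233 PPI for 6 um film grain
print(round(pitch_to_ppi(7.6)))   # ~3342 PPI for a 7.6 um pixel pitch
```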

Just for grins, the retinal sensors in the eyeball are about 5 um in size.   There are 100 million of them, but only about 100,000 are sensitive to color.   Of course, this image is never magnified, the sensor is spherical instead of flat, and focusing and color recognition are concentrated in a circular area only 1.5 mm in diameter.   No comparison is really possible except to note that the equivalent pixel size is only slightly smaller than either film or digital.  

Another metric is MTF.   This measures the ability to detect line pairs (lp/mm) in an image.   It is very dependent on the contrast available in the image.   It is most often used to compare one lens against another or one film against another.   For reference, the human eye has been quoted at 6 lp/mm at 2.5 cm and 1 lp/mm at 3.5 m.   Lenses and film generally start the high-end measurements at 40 lp/mm.   The film measurements generally quote the best numbers at contrast ratios of 1000:1 (unrealistic).   A digital sensor’s MTF is limited by the size of two adjacent pixels.   For a D100 this would be 66 lp/mm.   Whether or not the adjacent pixels would be able to detect the lines would be a function of the contrast and pixel sensitivity.   Once again, the metrics do not differ by orders of magnitude.   And the MTF of the image is ultimately a product of the MTFs of each of the elements in the system, including the lens, the printer, and the ultimate enlargement.   The important point is that a high-density digital sensor should not be the limiting factor in image resolution.  
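
The 66 lp/mm figure follows directly from the pixel pitch: one line pair needs at least two adjacent pixels, so the sampling limit is roughly 1 / (2 × pitch).   A quick sketch:

```python
def sensor_lp_per_mm(pitch_um):
    """Approximate sampling limit: one line pair spans two adjacent pixels."""
    pitch_mm = pitch_um / 1000.0
    return 1.0 / (2.0 * pitch_mm)

print(round(sensor_lp_per_mm(7.6)))   # ~66 lp/mm for the D100's 7.6 um pitch
```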

Dynamic Range and Contrast

The next important metric would be the dynamic range or the ability to detect and discriminate contrast.   The eye can resolve about 7 to 10 stops of light (contrast) at a single glance.   But a sun and shade daylight scene can easily contain 15 stops of light.   So even with the eye, dark and light tones can wind up compressed with loss of detail.   The eye can adjust from darkness to a bright scene in about 5 minutes.   It takes up to 30 minutes to fully adjust from strong light to darkness.   We do the same thing with film or digital images by varying the exposure.   And we see the same artifacts, blown highlights or loss of shadow detail.  

Figure 3

This is a contrived image intended only to illustrate the concept of dynamic range.   It is possible to see the moon and the sun in the same sky because they fit within the dynamic range of our vision.   It is impossible to see the stars behind the moon in daylight because the sun has overwhelmed any contrast.   The earthbound objects we see are illuminated by the sun.   This city shoreline had an exposure value of approximately EV +15.   The starry sky would be approximately EV -6.   The moon itself is about EV +14, but it would only illuminate this shoreline at about EV -2.   The direct noon sun itself is at least EV +22.   (Please don't take your digital camera and point it directly at the sun; you can actually damage the sensor.)   The dynamic range of this scene (if it could exist) would be about EV 28.   No single film, sensor, or eyeball can take it all in at one view.  

With prints, film, and images we can only measure the contrast range of the medium.   Digitally, black is 0 and white is 255 (8-bit).   How faithfully these are reproduced on paper is a printing matter.   So, the dynamic range describes how well the media can capture extreme tonal ranges in a scene.   To quantify this we need accurate measurements of the scene and accurate measurements of the resulting image.   Ansel Adams's zone system assumes a maximum range of eleven stops.  

The dynamic range of a device is the difference (contrast) between the minimum and maximum signal it can faithfully record.   It is sometimes expressed as a ratio between the minimum and maximum radiance (decibels).   Image density is frequently expressed as brightness measured with a densitometer on a logarithmic scale of 0 to 4.   A density of 3.0 represents 10 times greater intensity than a density of 2.0.   A contrast range of 100:1 is a density range of 2.0, and 1000:1 is a range of 3.0.  
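
The conversions are simple base-10 logarithms, as this short sketch shows.   The 3.4 figure is just the upper end of the slide-film density range quoted below.

```python
from math import log10

def ratio_to_density(contrast_ratio):
    """Density range is the base-10 logarithm of the contrast ratio."""
    return log10(contrast_ratio)

def density_to_ratio(density):
    """Contrast ratio corresponding to a given density range."""
    return 10 ** density

print(ratio_to_density(100))          # 2.0
print(ratio_to_density(1000))         # 3.0
print(round(density_to_ratio(3.4)))   # ~2512:1 for a density range of 3.4
```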

Expressed as a density range, the numbers typically quoted are: prints 1.7-2.0, negative film 2.4-2.8, slide film 3.2-3.4, digital (8-bit) 2.4, and digital (12-bit) 3.6.   For photographic discussions the dynamic range is usually expressed as zones or exposure stops.   Expressed as exposure stops, typical quoted values are: slides 5-6, negative film 8-10, black and white film 15, and digital 8.   There seem to be some discrepancies in these quotes.  

Some folks claim that negative film has both greater dynamic range and more latitude than digital.   I disagree, especially if you shoot in RAW format.   When you open a RAW image you typically have an option to decrease exposure by two stops or increase it by four stops.   This is similar to push/pull processing during negative development.   But there are other significant differences.   One is that once the film has been developed, it is cooked.   There is no opportunity to try the development again.   Obviously with digital RAW that is not the case unless you overwrite the RAW image.   The second is that with digital images blown highlights cannot be recovered, while with film lost shadow detail cannot be recovered.   They both suffer similar effects, just at different ends of the scale.   With film it is common to tweak the contrast and tonal balance during the print processing.   With digital we perform the same steps with editing programs, with much more control and ease.  

The electronic design and photon sensitivity of the individual sensors will affect the ability to faithfully capture tonal information.   Similar considerations exist for film emulsions.   Naturally folks can have favorite films and there are plenty of pros and cons in the debate between CCD and CMOS technologies.   An objective measurement between the media types is very difficult simply because the comparative objective data is not generally available.   So we are forced to rely on subjective evaluations.  

If the latitude is the tolerance to exposure errors, the difference between film and digital is in the highlights and shadows.   Film is less tolerant to under-exposure in the shadows while electronic sensors are less tolerant to over-exposure in the highlights.   In both cases this clipping is an analog property.  

The analog properties of the electronic sensor or film chemicals determine the dynamic range.   For digital sensors the bit-depth of the sensor determines how much of this is captured and how many unique tones can be preserved.   In black and white terms an 8-bit sensor can only record 256 tones.   A 12-bit sensor can record 4,096 tones.   So there are advantages to higher bit-depths, but they are not related to dynamic range.   Clipping occurs first at the analog stage, before any limits imposed by digitizing the data.   Even the human eye has limits to its dynamic range: night vision versus daylight vision.  
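
A small sketch of that point, using an idealized linear sensor: raising the bit-depth refines the tonal steps, but a signal that clips before the analog-to-digital conversion is lost at any bit-depth.

```python
def quantize(signal, bit_depth):
    """Quantize a normalized analog value. Clipping happens before the ADC,
    so anything past full scale (1.0) is lost regardless of bit-depth."""
    levels = 2 ** bit_depth - 1
    clipped = min(max(signal, 0.0), 1.0)
    return round(clipped * levels)

print(quantize(0.5, 8), quantize(0.5, 12))   # 128 vs. 2048: finer tonal steps
print(quantize(1.7, 8), quantize(1.7, 12))   # 255 vs. 4095: both still clipped
```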

Film suffers from reciprocity failures at very short or long exposure times.   Faster film speeds suffer from color de-saturation and noise.   Digital sensors suffer from electronic saturation at long exposure times and noise at higher sensitivities (ISO settings).  

Common measurements for film include the resolving power (lp/mm) at maximum and average contrast levels, spectral density, and color density.   Digital cameras offer only pixel counts and bit-depths.   The pixel count is not a definitive measure of resolution.   The bit-depth does not measure spectral or color response, only the number of tones that can be recorded.   When this kind of data is published for high-end digital cameras better comparisons will be possible.  

Diffraction Limits

There is a point where higher resolution will not improve image quality.   This is known as the diffraction limit.   Diffraction is an inescapable property of light and optics related to the size of the aperture (lens opening).   As light passes through this opening it bends slightly at the edges.   The smaller the opening, the greater the effect.   This causes an otherwise sharply focused image to become blurred.  
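
A common rule of thumb for an ideal, aberration-free lens puts the diffraction cutoff at 1 / (wavelength × f-number) line pairs per mm.   The sketch below assumes green light at 550 nm; note that this is the frequency where contrast reaches zero, and usable contrast gives out well before that.

```python
def diffraction_cutoff_lp_mm(f_number, wavelength_nm=550):
    """Theoretical diffraction cutoff (zero-contrast frequency) in lp/mm for
    an ideal lens, assuming mid-spectrum green light by default."""
    wavelength_mm = wavelength_nm / 1_000_000
    return 1.0 / (wavelength_mm * f_number)

for n in (5.6, 8, 11, 16, 22):
    print(f"f/{n}: ~{diffraction_cutoff_lp_mm(n):.0f} lp/mm")
```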

I constructed the following chart to illustrate the resolution characteristics and the limiting factors.   The vertical scale is not linear above 120 to fit the range of the diffraction limit curve.   I have expressed these blur circles as resolution in line pairs per millimeter (lp/mm).   The horizontal axis shows the aperture settings.

Figure 4

I have also included the curve of a theoretical lens showing resolution at a constant (max) contrast as the f-stop is changed.   Unfortunately most MTF charts for lenses show constant resolutions at varying contrast levels but only at the minimum aperture and at f/8.   For this comparison, the resolution as a function of the aperture is desired.  

The band of resolution factors for film and digital sensors is also shown, though this does not change with aperture.   A 6 um digital pixel is most similar to ISO 50-100 film and a 12 um pixel is more similar to some ISO 800+ films.   Most medium format digital backs use 9 um pixel sizes.  

The maximum resolution of the system is a function of the limits of the individual components.   There are some who believe this can be mathematically calculated via root mean square (RMS) formulas.   I do not agree, since this yields an average weighted value rather than a limit-biased value.   In other words, the system resolution cannot be better than the lowest component resolution.  
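
For comparison, here is a sketch of both approaches with made-up component figures.   The common RMS rule sums 1/R² over the components; the limiting view simply takes the weakest one.

```python
from math import sqrt

def rms_system_resolution(components_lp_mm):
    """Combine component resolutions with the 1/R^2 summation rule."""
    return 1.0 / sqrt(sum(1.0 / r ** 2 for r in components_lp_mm))

def limiting_resolution(components_lp_mm):
    """Treat the weakest component as the ceiling for the whole system."""
    return min(components_lp_mm)

parts = [80, 66, 120]   # hypothetical lens, sensor, and printer figures (lp/mm)
print(round(rms_system_resolution(parts)))   # ~47 lp/mm, below every component
print(limiting_resolution(parts))            # 66 lp/mm, the weakest link
```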

The bottom line is that the sharpest possible image will be in an area of this graph that falls under the limiting factors.   This shows clearly why the “sweet spot” is said to be between f/5.6 and f/11.  

The lines at the bottom of the chart show the circle of confusion (CoC) for various film and sensor formats, again as resolution (lp/mm).   These are not physical limits as with the previous metrics but subjective limits based on image resolution objectives for depth of field.   This clearly demonstrates why larger image formats yield sharper images at higher f-stops even though the diffraction limits are the same and the lens limits are similar.   Thus the related guideline that DX sensors are diffraction limited at f/16 while 35mm formats are diffraction limited at f/22.  

If you want to read more about the Circle of Confusion and diffraction limits click here.  

Conclusions

To get an objective answer to the film versus digital question you need to measure a specific film against a specific digital sensor.   In most cases the answers will be so close that they are insignificant.   There are many other factors including the lens and optical properties that will make the difference in the final judgement of the image quality.  Thus, my firm belief is that the debate is over and the race is a tie.  

Smaller pixels will be possible in the future.   Research and applications for nano-technology are already in process.   Diffraction limits, on the other hand, are properties of light.   Any breakthrough will have to be in the field of optics.   It will have an impact as significant as the invention of the telescope.   By that time we will be on the frontier of pico-technology.   Petapixels will be in vogue and we will store images in petafiles.   In the meantime, there is little practical advantage in smaller pixels for serious photography.   There are practical advantages for larger image formats.  

Smaller sensor formats do have advantages for journalism, sports, and wildlife photography.   Larger sensor formats have advantages for portrait, landscape, and artistic photography.  

If the debate over image quality is over, that leaves economics and utility functions to be evaluated.   The camera costs are just as variable in either medium.   You get what you pay for.   Some attempt to compare the cost of film to the costs of a computer and software.   One is capital equipment and one is supplies.   Some try to compare lab costs to the user’s time investment.   In fact either medium can be processed by a lab or by the photographer.   If you want to shoot digital but don’t want to invest in the equipment, training, or time to do the digital processing, just shoot JPG and take your images directly to your local super store.  

In any comparison of utility, ease of use or degree of control, digital wins hands down.   Press a button and those embarrassing shots disappear.   You get instant feedback to see if you captured the scene that you wanted and if the quality is at least acceptable.   You don’t have to change the film if the lighting has changed and you need more speed or a different white balance.   If you want point and shoot ease, just shoot JPG images and let the camera do all the lab processing.   If you want total control over the image processing, just shoot in RAW mode and do the lab work yourself with a digital editor.  

A single digital media card can hold the equivalent of ten or more rolls of film.   The image can be easily used for email to friends and family, for high quality small prints at home or a local lab, and for substantial enlargements at a quality lab.  

If you are unhappy with the colors from film, you just try another film.   If you are unhappy with the colors from digital you need to try a different sensor.   That means buy another camera.   With film any dust in the image is purged when you advance the frame or change the plate.   With digital you need to clean the sensor occasionally.   To some, this is brain surgery.  

An image can be judged on its artistic or technical merits.   There are three broad categories of technical quality.   These are sharpness, fidelity, and noise.   Sharpness is a subjective criterion, but it can be objectively measured in terms of contrast and resolution.   Fidelity is an assessment of faithful recording of luminosity and color.   It can be evaluated with metrics such as spectral and color curves, and dynamic or tonal range.   Noise is a broad category of artifacts that were not in the original scene but got recorded in the resulting image.   Film grain, electrical noise, reciprocity failures, and lens artifacts fall into this category.  

It is the emotional impact and artistic quality that sells an image.  

That is just my two cents.   I hope you also gained some new insight from this article.   If you have any comments, or suggestions, I would welcome your input.   Please send me an  email.  


Rags Gardner
Rags Int., Inc.
204 Trailwood Drive
Euless, TX 76039
(817) 267-2554
rags@compuserve.com
www.rags-int-inc.com
January 30, 2004
